Multiview Neural Surface Reconstruction by Disentangling Geometry and Appearance
In this work we address the challenging problem of multiview 3D surface reconstruction. We introduce a neural network architecture that simultaneously learns the unknown geometry, camera parameters, and a neural renderer that approximates the light reflected from the surface towards the camera. The geometry is represented as the zero level set of a neural network, while the neural renderer, derived from the rendering equation, is capable of (implicitly) modeling a wide set of lighting conditions and materials. We trained our network on real-world 2D images of objects with different material properties, lighting conditions, and noisy camera initializations from the DTU MVS dataset. We found our model to produce state-of-the-art 3D surface reconstructions with high fidelity, resolution, and detail.
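As a rough illustration of what "geometry as a zero level set" means in practice, the sketch below finds the point where a camera ray meets the surface {x : f(x) = 0} and estimates the normal there. It rests on several assumptions not stated in the abstract: an analytic sphere distance function (`sdf_sphere`) stands in for the neural network, f is assumed to behave like a signed distance, and sphere tracing is used as one common way to locate the intersection. All function names are hypothetical.

```python
import numpy as np

def sdf_sphere(p, radius=1.0):
    """Signed distance to a sphere at the origin (stand-in for the network f)."""
    return np.linalg.norm(p) - radius

def sphere_trace(sdf, origin, direction, max_steps=64, eps=1e-6, t_max=10.0):
    """March along the ray until f(x) is approximately 0; return the hit or None."""
    direction = direction / np.linalg.norm(direction)
    t = 0.0
    for _ in range(max_steps):
        p = origin + t * direction
        d = sdf(p)
        if abs(d) < eps:   # on the zero level set, up to tolerance
            return p
        t += d             # safe step: the surface is at least d away
        if t > t_max:      # the ray left the region of interest
            return None
    return None

def surface_normal(sdf, p, h=1e-5):
    """Unit normal as the normalized gradient of f (central differences)."""
    grad = np.array([(sdf(p + h * e) - sdf(p - h * e)) / (2 * h)
                     for e in np.eye(3)])
    return grad / np.linalg.norm(grad)

# A ray fired from z = -3 straight at the unit sphere.
hit = sphere_trace(sdf_sphere, np.array([0.0, 0.0, -3.0]),
                   np.array([0.0, 0.0, 1.0]))
n = surface_normal(sdf_sphere, hit)
```

With a learned f, the finite-difference gradient would typically be replaced by automatic differentiation; the surface point and normal are the quantities a neural renderer would then take as input.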
Review for NeurIPS paper: Multiview Neural Surface Reconstruction by Disentangling Geometry and Appearance
Clarity: As written, I find the explanation of the "implicit differentiable renderer" to be highly misleading. Throughout the paper, the network M is described as a "differentiable renderer" that accounts for both BRDF and lighting conditions. Indeed, the TITLE of the paper implies that lighting and materials are being recovered in the style of full inverse rendering. Section 3.2 introduces the rendering equation and makes a big deal about specifying the BRDF and light sources. However, this is all rendered pointless by lines 155-156: "Replacing M0 with a (sufficiently large) MLP approximation M provides the radiance approximation…" Rolling up all these factors into one big function means you are learning a surface light field, nothing more.
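To make the objection concrete, here is a sketch in standard rendering-equation notation. The symbols are assumed here for illustration and may not match the paper's exactly: x is a surface point, n its normal, w^o the outgoing (view) direction, w^i an incoming direction, B the BRDF, L^e emitted radiance, and L^i incoming radiance.

```latex
% Rendering equation at a surface point x (standard form)
L(x, w^o) = L^e(x, w^o)
  + \int_{\Omega} B(x, n, w^i, w^o)\, L^i(x, w^i)\, (n \cdot w^i)\, dw^i

% For a fixed scene the right-hand side depends only on (x, n, w^o):
L(x, w^o) = M_0(x, n, w^o)

% Replacing M_0 by an MLP M with parameters \gamma:
L(x, w^o) \approx M(x, n, w^o; \gamma)
```

Once the integral is absorbed into M, the BRDF B and the incoming light L^i no longer appear as separate quantities, so neither can be read off or edited independently. What M represents is outgoing radiance as a function of surface position and view direction, which is the definition of a surface light field.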